Fix flat reader subrange decode reuse #8596

Open
lukekim wants to merge 2 commits into vortex-data:develop from spiceai:lukim/8587-regression

lukekim (Contributor) commented Jun 25, 2026

Fixes #8587.

Summary

  • Memoize FlatReader's decoded array future so synthetic subrange scans share the decoded flat segment instead of issuing repeated decode work.
  • Add regression coverage for the query patterns that regressed after perf(scan): intra-file decode parallelism — sub-split large chunk spans #8400, including projection-only, filter-only, filtered projection, computed projection, string filtered projection, and string filtered computed projection cases.
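
The memoization idea in the first bullet can be illustrated with a minimal, synchronous sketch. It assumes a hypothetical `SketchFlatReader` type and uses `std::sync::OnceLock` as a stand-in for the shared decoded-array future; none of these names come from vortex-layout, and the placeholder "decode" merely widens bytes. The point is only that two subrange scans over the same flat segment trigger one decode.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a flat layout reader (not the vortex-layout type).
struct SketchFlatReader {
    encoded: Vec<u8>,
    /// Memoized decode result, filled in by the first caller.
    decoded: OnceLock<Arc<Vec<u32>>>,
    /// Counts how often the expensive decode actually ran.
    decode_calls: AtomicUsize,
}

impl SketchFlatReader {
    fn new(encoded: Vec<u8>) -> Self {
        Self {
            encoded,
            decoded: OnceLock::new(),
            decode_calls: AtomicUsize::new(0),
        }
    }

    /// Decode the whole segment at most once; later calls clone the Arc.
    fn decoded(&self) -> Arc<Vec<u32>> {
        let cached = self.decoded.get_or_init(|| {
            self.decode_calls.fetch_add(1, Ordering::Relaxed);
            // Placeholder "decode": widen each byte to u32.
            Arc::new(self.encoded.iter().map(|&b| u32::from(b)).collect())
        });
        Arc::clone(cached)
    }

    /// Serve a subrange scan from the shared decoded segment.
    fn scan(&self, rows: Range<usize>) -> Vec<u32> {
        self.decoded()[rows].to_vec()
    }

    fn decode_calls(&self) -> usize {
        self.decode_calls.load(Ordering::Relaxed)
    }
}

fn main() {
    let reader = SketchFlatReader::new((0u8..8).collect());
    // Two synthetic subranges over the same flat segment.
    let first = reader.scan(0..4);
    let second = reader.scan(4..8);
    println!("{first:?} {second:?} decode calls: {}", reader.decode_calls());
}
```

In an async reader the same shape would typically be a cloneable shared future rather than a `OnceLock`; that substitution is an assumption of this sketch, not a description of the actual change.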

Validation

  • cargo nextest run -p vortex-layout -E 'test(layouts::flat::reader)'
  • cargo nextest run -p vortex-layout
  • cargo clippy -p vortex-layout --all-targets --all-features
  • git diff --check
  • cargo bench --workspace
  • Re-ran SQL benches with /opt/homebrew/bin/uv 0.11.24; core suites passed: Appian, TPCH, TPCDS, ClickBench, ClickBench sorted, FineWeb, and GH Archive via direct binary rerun. PolarSignals/StatPopGen exposed pre-existing benchmark-definition/runtime backend failures, and bare Public BI requires --opt dataset=<name>.

Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

codspeed-hq Bot commented Jun 26, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 3 regressed benchmarks
✅ 1581 untouched benchmarks
⏩ 4 skipped benchmarks (see footnote 1)

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 15.9 µs | 26.7 µs | -40.29% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 169.1 µs | 205.8 µs | -17.83% |
| Simulation | slice_empty_vortex | 310 ns | 368.3 ns | -15.84% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 215.3 ns | +27.1% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 333.9 ns | 275.6 ns | +21.17% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 427.8 ns | 369.4 ns | +15.79% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 259.6 µs | 224.5 µs | +15.65% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 306.8 µs | 271.9 µs | +12.84% |
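
The Efficiency figures above appear consistent with BASE / HEAD - 1. That formula is inferred from the reported numbers, not taken from CodSpeed documentation, and the helper name `efficiency_pct` is hypothetical. Because the report shows rounded timings, recomputing from them may differ from the listed percentages in the last digits.

```rust
/// Relative efficiency in percent, assuming the column is BASE / HEAD - 1.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

fn main() {
    // Displayed timings from two rows of the report (units cancel out).
    let rows = [
        ("chunked_varbinview_into_canonical[(1000, 10)]", 169.1, 205.8),
        ("bitwise_not_vortex_buffer_mut[128]", 273.6, 215.3),
    ];
    for (name, base, head) in rows {
        println!("{name}: {:+.2}%", efficiency_pct(base, head));
    }
}
```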

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing spiceai:lukim/8587-regression (0b86845) with develop (bdbf6c4)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Development

Successfully merging this pull request may close these issues.

perf: Subsplitting large chunks causes some regression for vortex-compact for some benchmarks